Les Houches Lectures on Community Ecology: From Niche Theory to Statistical Mechanics

Ecosystems are among the most interesting and well-studied examples of self-organized complex systems. Community ecology, the study of how species interact with each other and the environment, has a rich tradition. Over the last few years, there has been a growing theoretical and experimental interest in these problems from the physics and quantitative biology communities. Here, we give an overview of community ecology, highlighting the deep connections between ecology and statistical physics. We start by introducing the two classes of mathematical models that have served as the workhorses of community ecology: Consumer Resource Models (CRM) and the generalized Lotka-Volterra models (GLV). We place a special emphasis on graphical methods and general principles. We then review recent works showing a deep and surprising connection between ecological dynamics and constrained optimization. We then shift our focus by analyzing these same models in “high-dimensions” (i.e. in the limit where the number of species and resources in the ecosystem becomes large) and discuss how such complex ecosystems can be analyzed using methods from the statistical physics of disordered systems such as the cavity method and Random Matrix Theory.


Introduction
Life is everywhere on our planet.No matter where one looks, from the boiling hot springs of Yellowstone to the glaciers of Antartica, one can find thriving ecosystems.Despite the enormous variation in temperature, energy supplies, and physical resources, in all these settings organisms interact with each other and their environments to self-organize and survive.Understanding this process is one of the fundamental goals of ecology.
Every ecosystem is different and shaped by the physical environments in which it functions.The organisms that can be found in the desert, where water is a scarce and precious resource, are very different from the kind of organisms that thrive in the rain forest where water is abundant.At a completely different scale, one finds microbial ecosystems -complex assemblages of microbes that metabolize and produce small molecules.Furthermore, far from being passive creatures in a fixed environment, organisms almost always transform the environments in which they live.The most dramatic example of such a transformation is the great oxidation event two billion years ago by the photosynthetic cynobacteria.
Understanding how ecosystems can assemble and function in light of all this complexity and difference has been a central strand of ecological research for the last century.These efforts have birthed an extremely rich theoretical and experimental tradition aimed at identifying the principles underlying community ecology.A central result of this work is the realization that despite all the innumerable differences between ecosystems -very different physical environments, the great heterogeneity in organism physiologies and traits -there exist a common set of core principles that fundamentally shape community assembly and function.In the language of physics, this is the statement ecosystems are shaped by universal processes.
These lectures introduce and build upon one of the most successful and rich theoretical paradigms for understanding ecosystems: niche-theory.As discussed below, niche-theory emphasizes the central role played by ecological competition and selection in shaping communities.Many of the ideas of niche theory have their origins in seminal papers from the 1960's by Richard Levins and Robert MacArthur that introduced many of the core ideas and mathematical models underlying niche theory [1][2][3][4].Since then, niche theory has been further developed and expanded upon by many ecologists giving rise to an extremely sophisticated and coherent framework for understanding community ecology.These ideas have been summarized in a number of wonderful books [5][6][7].For this reason, we are neither capable of, nor have we attempted to, give a comprehensive introduction to the ecological literature.
Instead, the focus of these lectures is on reformulating these ideas with the purpose of presenting them in a language and style more accessible to physicists.This process has often resulted in alternative derivations/formulations of key ideas.Throughout the review, we have tried our best to retain what we consider the best aspects of the theoretical ecology tradition -an emphasis on the use of graphical methods to produce robust theorems using realistic and generalizable models (see Levins' wonderful essay "The role of Model Building in Population Biology" [8]) -and combine this with the best of the physics pedagogical tradition.
The timing of the lectures is inspired by the exciting and fruitful developments over the last decade in the physics of living systems.This period has seen a renewed interest in nichetheory and community ecology from the viewpoint of statistical physics.These works have sought to extend and integrate many of the ideas of niche theory with techniques and tools from statistical physics, leading to rapid progress and a plethora of new insights.However, this literature is often scattered across journals in many different fields ( ecology, physics, microbial ecology, biology) making it difficult to formulate a coherent picture.For this reason we have made a concerted attempt to provide a concise yet coherent introduction to these ideas.
This renewed interest in ecology among physicists has been driven by big experimental advances.This has been especially true of in the microbial realm, where the adoption of quantitative experimental designs centered around the rapid progress in DNA sequencing has allowed for the ability to catalogue the species present in a microbial ecosystem on a scale and resolution that was unimaginable just twenty years ago.This wealth of data has revealed that microbial communities are extremely diverse with tens to thousands of different strains/species present in most ecosystems.This has raised a number of new and fundamen-

Response to Perturbations Ecosystem Function
Figure 1: The three core problems and their interrelations.
tal questions about the diversity, function, and structure of microbial ecosystems.To answers these questions, physicists and ecologists have also started to design quantitative experiments that probe how the physical and chemical environment shapes the structure and function of microbial communities [9][10][11][12][13][14][15][16][17][18].One of the fundamental insights that has resulted from these experiments are that microbes fundamentally transform the environments in which they livesuggesting that any theory of microbial ecosystems must account for environmental engineering and the strong coupling between organism and environment [19,20].
The immense diversity seen in real ecosystems combined with the success of alternative frameworks for understanding diversity seen in ecosystems such as neutral theory (see [21] and [22] for review directed at statistical physicists) has also resulted in a renewed interest understanding how niche theory can be adapted to diverse ecosystems with many species and resources.This has led to a rich body of recent works exploring niche-theory in the "highdimensional" limit.Much of this work draws inspiration from the rich literature in the statistical physics of disordered systems and random matrix theory.
Unfortunately, there are many interesting recent research directions that we have been unable to cover here.These include new graphical methods that build upon those presented here [23,24] , the very impressive new work on dynamics and chaos in high-dimensional ecosystems (including dynamical mean-field theories) [25][26][27][28][29][30][31], the extension of consumer resource models to the microbial ecosystems by including metabolism and metabolic crossfeeding [12,18,[32][33][34][35], and macroecological laws [36].It is our hope that these lectures can serve as a starting point for readers interested in exploring this literature.
Finally, we note in line with the tradition of the Les Houches lectures we have kept citations to a minimum.We apologize for the many works we have inevitably failed to cite.We wish to emphasize that these lectures reflect the personal perspective of the authors (Cui, Marsland, Mehta), developed over the last few years working in ecology.In no way should these lectures be taken to be a comprehensive or complete overview of the field.In line with this, many of the derivations of classic results presented in the low-dimensional chapter are new to the literature.However, we have found these derivations to be more intuitive and less mathematical and for this reason have included these in the lectures.

Fundamental questions
Many works in community ecology are concerned with three inter-related problems.The first of these is to understand and predict community structure.Community structure is a very broad concept that refers to any property of the communities that reside in an ecosystem [37][38][39][40].These include the richness of an ecosystem -the number of different species in a community , the diversity of a community -measures of species biodiversity that also accounts for differences in species abundances (e.g.Shannon entropy of species abundances in an ecosystem), and even the predicability of community assembly -will you end up with the same community in similar environments?
The second major problem in community ecology is to understand ecosystem function [41][42][43][44].Ecosystem function is usually defined as the physiochemical and biological processes that occur in an ecosystem that are necessary for maintaining life [45].Examples including processes that create biomass and energy, material transformations necessary for ecosystems to thrive (e.g nutrient recycling, biomass production), and often is used to to denote just about any property of an ecosystem that can help sustain human life.This vagueness in definition often leads to numerous debates, controversies, and disagreements.Chief among these is whether the notion of function is something imposed externally by humans or is it an internal property of the ecosystem itself.In our experience, in the literature both of these meanings of ecosystem function are commonly employed.
The third major reoccurring problem addressed in community ecology is understanding how ecosystem properties (including community structure and ecological function) respond to perturbations [46,47].Examples of the kind of perturbations that are of interest include changes to the physical environment (e.g.warming of temperatures), the response of the ecosystem to immigration/invasion by new organisms, or the effect temporal disturbances.These questions are often motivated by important practical questions related to the preservation and management of environments or human health (e.g. the effects of antibiotics on the gut microbiome -the collection of microbes that live inside the human gut).
A major goal of modern ecological research is to relate these three concepts [48].For example, a large line of recent research has been concerned with asking how biodiversity affects ecological function [45,49].Another interesting line of research seeks to understand how the structure of food webs shapes community structure and stability in response to perturbations [50].
Finally, we note that while many of these questions and ideas were originally formulated for macroscopic ecosystems such as forests or coral reefs, the last few years have seen an increased interest in using these ideas to understand microbial ecosystems [51][52][53][54].Microbes offer a particularly nice and tractable setting for exploring these problems since it is possible to directly measure community structure using DNA sequencing and the small size of these ecosystems makes it possible to tightly control the physical and biochemical environments.

Ecological Processes
In order to situate niche theory in the broader context of community ecology, it is worth briefly considering a very general model of community ecology that incorporates the four major ecological processes thought to be important for ecosystems: (i) ecological selection -the idea that organisms that are more suited to an environment will survive in an ecosystem; (ii) stochasticity -processes such as demographic noise that introduce randomness; (iii) dispersal -species live in local communities and can immigrate between these communities; and finally (iv) speciation -the process through which new species can emerge from evolution.In what follows, we will always focus on timescales over which speciation is negligible (although it can be included as a special kind of "dispersal" from an imaginary pool of possible phenotypes).
A very general set of stylized equations that incorporates the first three of these processes is illustrated in Fig. 2(a) and consists of a set of local patches, a, b, c, . . ., a set of species labeled by latin letters i whose population in patch a is given by N ia , and local environmental variables labeled by greek letters α whose value in patch a is given by E αa .Within each patch, the dynamics are described by the following set of differential equations: ) where λ i a b is the migration rate of species i from patch b to patch a, and ξ ia (t) is Gaussian white noise.
The reader should not dwell too much on the details of Equation ( 2).This equation is too general to be amenable to mathematical analysis.Instead, the goal of writing Eq. ( 2) is to get a sense of how one can represent the ecological processes discussed above mathematically.Notice that the g i contain the information about ecological selection since these functions encode how well a species will grow.The dependence of g i on N and E is the mathematical manifestation of the fact that ecological selection is inseparable from interactions with other species and the shared environment.The λ i ab term represents dispersal.This terms controls how species immigrate between patches.The noise term D i N i ξ i (t) models the effects of demographic stochasticity.The vector h αa represents the intrinsic behavior of the environment in the absence of other organisms, and is known as the supply vector in contemporary niche theory, where it usually refers to an external supply of consumable resources.The impact of the organisms on the environment is captured by the second term, where the per-capita impact of each species j on the environment is measured by an impact vector q jα .This impact vector is the mathematical manifestation of the fact that species fundamentally modify the environments in the which they live.
The major branches of ecological theory shown in Fig. 2 each emphasize different aspects of this framework.Consumer Resource Models (CRMs) -the main model underlying contemporary niche theory -and generalized Lotka-Volterra (GLV) models -the main models underlying modern coexistence theory -focus on interactions with the environment and with other species, respectively.Neutral theory considers a different limit, where selective differences among species are absent, and species distributions are driven by drift and dispersal.Despite the differences of focus, these models have some overlap: interactions with the environment in CRMs produce effective interactions between species and thus can be implicitly included in a GLV model.Both of these models reduce to neutral theory in the limit of vanishing selective differences (g i = g for all i).Recent work from statistical physics literature also suggests that niche and neutral theory are actually better thought of as theories for two different "ecological phases" -akin to a ferromagnet and paramagnet in spin systems -rather than complete theories themselves [55,56].
In the following sections, we give an overview of niche theory.Niche theory neglect drifts and dispersal and focus on the effects of selection.To build ecological intuitions, we start by considering "low-dimensional" ecosystems with 2-3 species and 2 environmental factors.We focus on the two most common types of mathematical models: generalized consumer resource models (CRMs) and generalized Lotka-Volterra models (GLVs).Whereas CRMs explicitly model the environment, GLVs model the environment implicitly through its effect on species-species interactions.Having build our ecological intuition in this low-dimensional set-  ting, we then consider "high-dimensional" ecology where both the number of species and resources are assumed to be large.In this setting, we show how ideas from the statistical physics of disordered systems can be adapted to ecology in order to understand complex ecosystems.

An introduction to consumer resource models
One of the most influential class of ecological models in theoretical ecology are consumer resource models (CRMs).CRMs assume that all interactions between species are mediated by the consumption of external resources.Mathematically, this means that the g i and q jα in Eq.
(1-2) are functions of E alone, and do not depend directly on the population sizes N. The only environmental factors considered in CRMs are consumable resources.For this reason, we use the more familiar notation R α instead of E α , where each R α represents the abundance of one type of resource in the local patch.As noted above, CRMs neglect drift and dispersal focusing on selection.With these restrictions, we obtain the following set of differential equations for the dynamics within a single patch: The elimination of direct interactions between species gives this class of models a bipartite structure, which is amenable to graphical representation, especially for ecosystem with two resources.Fig. 3 shows how the growth rate, impact vectors and supply can be represented in the two-dimensional space of resource abundances.The set of environmental states R where g i (R) = 0 comprise the zero net growth isocline (ZNGI), a line that separates states of negative growth rate from states of positive growth rate for each species.The impact q iα (R) for each species can be represented as a vector field in this space, as can the supply vector h α (R).
This representation immediately leads to geometric criteria for identifying fixed points.In two dimensions, there is an established set of three geometric conditions that together guarantee stable coexistence of two species in a large class of models [6,7,57]: 1.The ZNGI's of both species must intersect.2. The supply vector h α (evaluated at the intersection point) must lie within the cone spanned by the negative impact vectors −q iα (also evaluated at the intersection point).
3. The impact of each species must be biased relatively more towards the resource that has a relatively larger effect on its own growth rate.
The first condition immediately follows from requiring d N i /d t = 0 for all surviving species, and is valid in any dimension.This is the source of the Competitive Exclusion Principle, which says that at most M consumer species can stably coexist.The competitive exclusion principle is simply the statement that, generically, at most M ZGNIs can intersect at a single point in an M -dimensional space in the absence of additional constraints or degeneracies in growth rates.
The second condition guarantees that a set of positive population sizes N i can be found whose combined impact j N j q jα perfectly balances the supply h α , so that dR α /d t = 0. Rewriting the second equation in Eq. 4 with the left hand side set to zero gives Since species abundances must be positive N j > 0, the equation has the geometric interpretation that the the supply h α must lie within the cone spanned by the negative impact vectors.This condition is also valid in any dimension, and for any number of species, not just in M = S = 2.For the most commonly used models the supply vector can be regarded as pointing directly towards a "supply point," as depicted in Fig. 3.The supply point represents the steady-state resource abundances in the absence of consumers.The graphical representation is further simplified in such cases, since the full supply vector field no longer needs to be directly considered, and the satisfaction of the coexistence condition depends simply on whether the supply point (which is often modeled as a directly accessible control parameter) falls inside the coexistence cone.The third condition is the most subtle.The first two conditions establish that d N i /d t = dR α /d t = 0, but imply nothing about the stability of this fixed point.In two dimensions, Tilman showed that the stability of the fixed point in a large class of two-dimensional models can be determined by comparing the relative orientation of the impact vectors to the relative orientation of the ZNGI's (see also Appendix B) [57].
In the rest of this section, we illustrate this graphical approach and the conclusions that can be drawn from it using three canonical examples.See Appendix A for a systematic taxonomy of the main kinds of classic niche theory models, with additional examples.

MacArthur's Consumer Resource Model
We begin with MacArthur's Consumer Resource Model (MCRM), which was the first model developed within this framework, and remains a fundamental reference point for ecological theory ( [4,58]).This model describes competition for "substitutable" resources using simple mass-action kinetics.Individuals of consumer species i encounter and consume units of resource α at a rate c iα R α , where the c iα are constants expressing the preferences of species i for the different types.Resource α carries an energy density w α .Some amount of energy per individual m i is required to sustain the population at steady state, and excess energy leads to population growth, with an efficiency factor e i converting from energy intake to growth rate.Finally, the resources themselves are self-renewing, such that in the absence of consumers, each resource type α follows its own independent logistic growth law with carrying capacity K α and low-density growth rate r α .Putting this all together, we have: Graphical representations of this model for two or three species and two resources are shown in Fig. 3(a,b).In sketching these figures, we have used the fact that the equilibrium condition dR α /d t = 0 for each resource can be arbitrarily rescaled without affecting the results of the analysis.The standard practice in the literature is to multiply both sides by w α /R α , yielding: This can equivalently be viewed as the equilibrium condition for a model with supply vector which points directly at the supply point K α , and impact vector q iα = −w α c iα (10) which is minus the gradient ∂ g i /∂ R α of the growth rate.This rescaling is extremely convenient for the graphical representation because the impact vectors are always perpendicular to the ZNGI's, and the coexistence condition reduces to the requirement that the supply point fall inside the (negative) cone spanned by these vectors.If the supply point falls outside this cone but above at least one of the ZNGI's, then one of the species will persist, driving the other one extinct.As shown in Fig. 3(a), this allows us to divide up the whole resource space into three regions based on the outcome of competition when the supply point is placed within the region: the "coexistence cone" where both species persist and two regions where one of the species competitively excludes the other.Note that the third criterion for stable coexistence above is always satisfied in this model, because the angle of the impact vector is tied to the angle of the ZNGI, and so the equilibrium point is always linearly stable.Fig. 3(b) illustrates some additional features of this model that follow from the graphical representation.In the figure, we have drawn ZNGI's for three species, resulting in two possible coexistence cones.Notice as expected from the competitive exclusion principle, at most two of the species can coexist at anytime.The first thing to note is that the third intersection, between the orange and green species, is not a possible coexistence point, because the blue species has a positive growth rate there, and will always re-invade.The second feature is the presence of two cones at the intersection of the inner ZNGI's with the resource axes.The existence of these cones does not depend on the number of species, but they were omitted from panel (a) for simplicity, since they do not affect the outcome of competition.When the supply point lies inside one of these cones, one of the resources is driven to extinction, and the equilibrium state lies at the intersection of the ZNGI with the resource axis.
When these cones are included in the diagram, then the competitive exclusion limit is saturated whenever the supply point falls in any cone.Although only one species survives in the boundary cones, the number of species still equals the number of persisting resources.This leads us to the final interesting feature of the diagram, which is the fact that the inner boundaries of neighboring cones are always parallel.This follows immediately from the fact that the impact vectors are perpendicular to the ZNGI's.It implies that the cones cover the full 90-degree angle of the quadrant of positive resource abundances.As the magnitude of the supply grows (i.e., the distance of the supply point from the origin), it therefore requires increasingly precise fine-tuning of the supply point to avoid landing in one of the cones, with the result that the competitive exclusion bound is generically saturated in the limit of large supply.

Externally supplied resources
The logistic growth of the resources in the absence of consumers makes the MCRM a reasonable model of competition for substitutable biotic resources, such as for herbivores competing for plants.But if the resources are externally supplied nutrients, a different model is needed.For example, in microbial ecology, there can be competition for various organic molecules that can be used as carbon sources.In such cases, a better model for the supply vector is a simple linear law h α = κ α − ω α R α such as would arise from an ideal chemostat with incoming nutrient flux κ α and dilution rate ω α .With this supply, we obtain: In this case, the true supply vector already points directly towards the supply point κ α /ω α .Notice that the impact vector q iα = −c iα R α depends on the resource concentration and is no longer perpendicular to the ZNGI.The factor of R α rotates it towards the most abundant resource, compressing all the supply cones towards the resource axes as shown in Fig. 3(c).
The first consequence of this modified cone geometry is that the extinct resource cones disappear.When R α = 0 for one of the resources, the impact vector becomes parallel with the other resource axis, shrinking the angle spanned by this cone to zero.The fact that no equilibrium state can have R α = 0 for any α can also be seen by direct inspection of the Eq. ( 12), where dR α /d t = κ α > 0 at zero concentration of resource α.The second consequence is that adjacent coexistence cones no longer have parallel boundaries.This means that a finite fraction of possible supply points remain outside all the cones even in the limit of infinite total supply.In the high-dimensional setting, this prevents random high-dimensional ecosystems with linear resource dynamics from saturating the competitive exclusion bound [59].

Essential resources
Both of the previous examples assumed that resources were substitutable, and that the same growth rate could be obtained with any combination of resources with the same combined energy content.But many of the resources required by organisms play distinct functional roles, all of which must be filled for the organism to survive.Plants, for example, require sufficient supplies of carbon, nitrogen and phosphorous to be present in the soil.If any one of the three is too scarce, the plant will not survive, regardless of the available quantities of the other two.This phenomenon implies that the ZNGI's do not intersect with any of the axes, and must have the roughly hyperbolic shape sketched in Fig. 3(d).Such ZNGI's can be obtained from a model of the form: with n < −1.In the limit n → −∞, this model reduces to Liebig's Law of the Minimum, where the growth rate of each species only depends on the concentration of the most limiting resource: We have chosen the impact vector in the standard way, in which the nonlinear term is treated as a total resource uptake rate, which is partitioned among the available resources according to the fixed biomass stoichiometry of the species v iα [60].
In this model, the direction of the impact vector is set by the biomass stoichiometry v iα , regardless of the resource concentration and of the parameters c iα and n that define the shape of the ZNGI.This means that the third criterion for stable coexistence is not necessarily satisfied, and fixed points can exist that are linearly unstable.Panels (e) and (f) of Fig. 3 represent two possible choices of biomass stoichiometry that lead to the same coexistence cone, with only the first one leading to stable coexistence.
The existence of different kinds of fixed points in this model opens the door to more complex attractors.For example, limit cycles, heteroclinic cycles, and chaos have all been observed in models of this basic type when M , S ≥ 3 [61][62][63].

Generalized Lotka-Volterra Theory
In the introduction to these lectures, we noted that ecological theory considers two kinds of interactions that affect an organism's growth rate: interactions with the environment and interactions with other species.In the previous section, we saw how CRMs focus on the first kind of interaction.Historically, the earliest attempts at mathematical modeling of ecosystems focused on the second kind of interactions, reducing the role of the environment to a mediator of species-species interactions [64][65][66].The most influential model for understanding speciesspecies interactions is the Lotka-Volterra model.This model was first developed by Lotka and Volterra to describe predator-prey oscillations, and later extended to more general scenarios of interactions among larger collections of species [55,64,67].
In this section, we derive the multi-species (or "generalized") Lotka-Volterra equations (GLV) as a limit of the framework introduced in eq.(1-2).We then review how these equations have been used to analyze coexistence in two-species communities.Finally, we provide additional intuition for these results and point out potential complications by applying them to the effective interactions that emerge from the consumer resource models (CRMs) studied above.We will see that "higher-order" interactions generically arise from resource-explicit models, and that often, the strength of interaction between a pair of species becomes contextdependent.

Deriving the GLV Equations
From the physicist's perspective, the GLV equations are analogous to elasticity theory.We take a complex nonlinear system and describe it in terms of the phenomenological linear-response parameters that characterize its behavior near a stable fixed point.This can be straightforwardly carried out in the case of a fixed environment E = const.Expanding the growth rate g i to linear order in the perturbation N − N from a fixed point N, we obtain: where g i and its derivatives are evaluated at N = N.If all species are present at nonzero abundance then we know g i ( N) = 0.But the g i is included in this expression to allow for expansion about fixed points where N i = 0 for some i.This will be required in the invasion analysis below.
If this were the exact model, with no higher-order terms, the intrinsic growth rate r i of species i would be found by extrapolating to N j = 0 for all j in the sum, yielding: The effect of other organisms on the growth rate is given by the "competition coefficients" α i j , equal to In terms of these parameters, the local dynamics are This is the canonical GLV equation.Since the derivatives that define r i and α i j are evaluated at N = N, they will in general take on different values at different fixed points, including when different assemblages of species are present.The GLV framework is frequently employed to model environmentally mediated interactions as well as direct interactions.Doing this requires somehow writing the environmental state as a function of the population sizes E(N).Once this is achieved, the growth rate g i (N, E(N)) is again a function of the population sizes alone, and the GLV parameters are readily obtained using the chain rule: If the environmental dynamics have a stable fixed point for arbitrary fixed N, this fixed point defines a function E(N) that can be used in the above definitions.But the resulting GLV dynamics do not necessarily match the original dynamics, even in the vicinity of the fixed point [68].

Invasion analysis and Modern Coexistence Theory
One of the main uses to which the GLV equations have been put is the derivation of conditions for species coexistence.A categorization of the main factors affecting coexistence has been developed with the help of this framework, and is known as Modern Coexistence Theory (MCT) [37,69,70].Instead of studying the properties of fixed points, as we did with niche theory above, MCT obtains sufficient conditions for coexistence in terms of the ability of each species to invade the rest of the community.While this approach has some limitations, which will emerge in the examples below, it has the distinct advantage of not depending on how well the GLV equations approximate the dynamics.The analysis is conducted entirely in terms of the sign of the growth rate g i of the hypothetical invader at a fixed point of the resident community, for which the GLV equations provide an exact expression.
The central results of this theory concern the interaction of two species [37,69].We start by computing the equilibrium abundance of each species i when it is the only species present: where the / j in the subscript indicates that these quantities are being evaluated at the fixed point where the other species j is absent.If we now imagine introducing species j at vanishingly small density, its initial per-capita growth rate is: For species j to successfully invade, this quantity must be positive, which implies: The two species i and j are guaranteed to stably coexist when each one is capable of invading the other, that is, when: This is a sufficient condition for coexistence of two species in any model within our framework, as long as the relationship E(N) can be defined.It is not a necessary condition, because it is possible for a species that fails to invade at low density to successfully invade at high density.
But it does facilitate a robust classification of factors that promote coexistence.This classification is normally formulated by distinguishing two contributions to the expressions that appear in the inequality.This distinction implicitly assumes that the GLV parameters are independent of which species is resident, so that the / j notation can be dropped from eq. (26).The quantities r i , r j can then be interpreted as estimates of the intrinsic "fitness" of the each species, and the same fitness ratio r j /r i appears on both sides of (26).The expression can therefore be rearranged to give: Factors that tend to reduce fitness differences (making r i /r j closer to 1) are known as "equalizing mechanisms" of coexistence.But the coexistence condition (27) shows that equalizing mechanisms are insufficient to achieve coexistence, because even as the ratio approaches 1, the ratios of competition coefficients still remain.We see that coexistence requires at least one of the species to compete with itself more than it competes with the other species (i.e., α ii > α i j ).Factors that increase competition among members of the same species relative to competition with the other species are known as "stabilizing mechanisms" of coexistence.
In the following sections, we present the GLV approximations to the CRMs presented in Section 4 above.In the original MCRM (eq.6-7), the approximation is exact, and stabilizing mechanisms have a mechanistic interpretation in terms of decreased niche overlap.When the resources are externally supplied (eq.11-12), the GLV parameters become context-dependent, and the equalizing and stabilizing mechanisms can no longer be cleanly separated as in eq. ( 27).

MacArthur's Consumer Resource Model
Additional intuition about the operation of stabilizing mechanisms comes from analyzing the GLV approximation to the MCRM [4,58].We will consider a slightly generalized version of the model, allowing depletion rates d iα to be independent of the growth law parameters c iα : Following the procedure outlined above, we write the resource abundances as a function of the population sizes by setting dR α /d t = 0.This equation has two solutions, R α = 0, and If the term in brackets is negative, then dR α /d t < 0 for all positive R α , and so R α = 0 is the stable solution.Otherwise, R α = 0 is unstable, because a small addition of the resource will lead to a positive growth rate.The stable equilibrium state is therefore given by Inserting this into eqs.(21), we obtain an effective GLV of the form: with and where M * represents the set of resources with nonzero abundance.Eq. 33 has a simple interpretation.The intrinsic growth rate r i is the difference between the maximum consumption capacity β∈M * w β c iβ K β at the given resource supply levels and the maintenance cost m i .
The formula for the competition matrix in eq. ( 34) carries an ecologically significant meaning, showing how the strength of competition is determined by the overlap between the requirements c iα of one species and the impacts d jα of the other.The overlap is weighted by a product of three factors: (a) the value w α of each resource, with more energy-rich resources counting for more in the sum, (b) the low-density growth timescale 1/r α for each resource, with rapid renewal decreasing the significance of the resource as a locus of competition and (c) the resource carrying capacity K α , giving resources with higher carrying capacity a greater contribution to the interaction.
Note that if no resources are driven to extinction, then both r i and α i j are independent of which fixed point is chosen (e.g., whether both species, only species i or only species j is present in the two-species case).We can therefore drop the extra index indicating which species is absent, and simply write r i and α i j as assumed in the simplified form of the coexistence condition (27).
These considerations suggest that the scaled quantity α i j /α ii is a natural definition of the "niche overlap" between species i and j.This definition is ambiguous, however, since it is not symmetric under exchange of i and j.If we symmetrize by taking the geometric mean, we can define the niche overlap ρ by With this definition, and recalling that the values of the GLV parameters do not depend on which species is the resident and which is the invader (as long as no resources go extinct), we can simplify the coexistence condition (26) into the more interpretable form [37, 69] In this context, stabilizing mechanisms are features of the ecosystem that reduce the niche overlap, opening up more space for species with different maintenance costs m i or consumption capacity β∈M * w β c iβ K β (the two components of r i ) to stably coexist.

Externally supplied resources
As noted above, a special feature of the MCRM is that the "intrinsic" growth rate r i and the competition coefficient α i j between species i and species j (Eq.34) can be fully determined by the properties of these two species and of the resources, regardless of which species or combination of species is currently resident.The one exception occurs when the addition of a third species drives one of the resources extinct, so that the set M * of resources to be summed over changes.This caveat is often ignored in the literature.
In other models, the context-dependence of the GLV parameters is much more apparent.For the case of externally supplied substitutable resources (Eq.11-12), solving for the resource equilibrium yields: Inserting this into the equation for the consumer dynamics (Eq.11) gives: This is no longer of the GLV form.But in the vicinity of the equilibrium state N * i , we can still use eq.(21)(22) to locally obtain the GLV parameters The contribution of each resource to the sums in both equations now depends on its equilibrium abundance R * α .This complicates the above analysis of stabilizing versus equalizing mechanisms, since in general it is no longer true that r i/ j ≈ r i/i .In communities with more than two species, this also generates effective higher-order interactions of all orders, since the interaction between any pair of species is affected by all the other species with overlapping consumption preferences.

Niche theory and optimization
Ecological models where species interact reciprocally (i.e.how species i affects species j is identical to how species j affects species i) can be recast in the language of optimization.This optimization perspective provides a powerful set of analytical tools.In the papers where MacArthur first presents his Consumer Resource Model, he already started to investigate the conditions under which this ecological model can be represented in terms of an analogous optimization principle [66,71].The framework laid out in this chapter makes it possible to extend his original results and state the general conditions under which an optimization principle exists [72].One the key lessons from these recent works is that conducting the optimization in environmental space -instead of in the space of population sizes -leads to a transparent ecological interpretation of the objective function.
In the following sections, we first review MacArthur's original result, where the optimization is performed over the space of population sizes in the GLV approximation to the MCRM.We then briefly describe the optimization problems in resource space solved by the equilibrium states of the CRMs introduced in Section 4. We encourage the reader to consult [72] for further details and additional examples.

MacArthur's Minimization Principle
MacArthur noticed that the GLV representation of the MCRM given above in eq. ( 20) and (33)(34) can be written in terms of the gradient of a quadratic function of the N i 's: with This is easily verified by performing the partial derivative and comparing with the original equation.Equation (41) implies that ∂ Q/∂ N i = 0 in equilibrium for all non-extinct populations i.The negative sign guarantees that this stationary point is a local minimum rather than a maximum.For the extinct populations, stability against re-invasion requires ∂ Q/∂ N i > 0. This means that setting N i = 0 also minimizes Q along these directions, subject to the feasibility constraint N i ≥ 0 [73,74].We have plotted Q(N) for a community with two consumer species in fig.4(a), along with the equilibrium state eventually reached in a numerical simulation of the full MCRM (eq.6-7).This result was an important step forward in understanding the nature of equilibrium states in this model.It implies, for example, that there is only one stable equilibrium state, since Q is a convex function with a single local minimum.But the objective function Q lacks an intuitive interpretation except in some special cases, and the essential assumptions required to obtain an optimization principle remained unclear.In the following sections, we show how the optimization perspective can be applied to a surprisingly large class of models, with a uniform physical interpretation in terms of the perturbation of the environment away from the supply point.The crucial assumption turns out to be the symmetry of the environmentallymediated interactions among species [72].In modern language, this is simply the statement that interactions between species are reciprocal.
This formulation of the minimization principle can also be extended to models without symmetry, although its practical implementation as a means of finding equilibrium state becomes more complicated (see [72] for detailed discussion).

MacArthur's Consumer Resource Model
We begin with MacArthur's original model, presented in eqs.(6-7) above.At steady-state, we have that where in the first line we have defined the growth rate of species i: To proceed, we rewrite the second of these equations in terms of the function It turns out that d(R 0 , R) is the natural measure of environmental perturbation and has a clear interpretation as the weighted Euclidean distance of the resource abundance vector R from the supply point R 0 (which in this case is simply equal to the vector of resource carrying capacities R 0 α = K α ).A straightforward calculation shows that we can rewrite Eq. 44 as If we in addition note that both species and resource abundances must be positive, the steady-state conditions can be summarized by the following four conditions: Feasible populations 0 These are identical to the well-known Karush-Kuhn-Tucker (KKT) conditions for constrained optimization under the constraints g i ≤ 0, with the scaled population sizes N i /e i playing the role of the generalized Lagrange multipliers (also called KKT multipliers), and with d(R, R 0 ) as the optimized function ( [75,76]).These conditions generalize the theory of Lagrange multipliers to the case of inequality constraints.
In particular, these four conditions are sufficient to show that the steady-state resource abundances R are solutions to the following constrained optimization problem min with R 0 = K.Since the resource dynamics in eq. ( 44) generally has two potential solutions: R α = 0 and R α > 0, we impose additional constraints R α ≥ 0 in the optimization process to find the steady-state solution with the maximum number of surviving resources.This optimization equation states that the contribution of each resource to the minimized "distance" encoded in d(R, R 0 ) is weighted by the ecological significance of changes in its abundance.This weight has three components.The first factor, w α , measures the nutritional value of the resources.Resources with low values of w α contribute less to the growth for consumer populations, and changes in their abundance are therefore less important.The second factor, r α , controls the rate of resource renewal.Abundances of resources with high rates of self-renewal are more difficult to perturb than those of resources that grow back slowly, and so a given shift in abundance is more significant for the former than for the latter.Finally, the factor of K −1 α reflects the fact that a perturbation of the same absolute size is less significant if the carrying capacity is larger.
In the constrained optimization problem, the species abundances N i are simply the generalized Lagrange multipliers (KKT multipliers) corresponding to the S inequality constraints g i (R) ≤ 0. Notice that the steady-state species condition (known as the K.K.T. condition in the constrained optimization literature) is simply the statement that N i e i g i (R) = 0.It states that species is either extinct (i.e.g i (R) < 0 meaning that N i = 0) or that a species survives at steady-state (i.e.g i (R) = 0 meaning that N i > 0).Furthermore, from the definition of Lagrange multipliers, we see that the species abundance N i can also be thought of as how much the objective d(R, R 0 ) changes if we infinitesimally change the growth rate g i (R) of species i.

Externally supplied resources
The environment-based perspective on optimization facilitates generalization to other models not considered by MacArthur, including the model of externally supplied susbstitutable resources discussed above in eq.(11)(12).For this model, the objective function is no longer quadratic, but is given by a weighted Kullback-Leibler (KL) divergence (see [72] for detailed derivation): where the supply point is given by R 0 α = κ α /ω.The constrained minimization of this function at equilibrium can easily be verified by taking the derivative and substituting into the KKT conditions as given above in eq.(48)(49)(50)(51).The KL divergence is a natural way of quantifying the difference between two vectors with all positive components, such as probabilities or chemical concentrations ( [77]).As in the original MacArthur model, the contribution of each resource is weighted by its nutritional value w α .But now the feasibility constraint R α ≥ 0 need not be enforced explicitly, because d(R 0 , R) diverges as R α → 0, guaranteeing that the constrained optimum will always lie in the feasible region.

Essential resources
We now turn to the case of nonsubstitutable resources with fixed biomass stoichiometry, as represented by eq.(13-14) above with n < −1.The environmentally mediated interactions among consumers in these models are non-reciprocal because organisms can affect the environment in ways that are unrelated to their own growth rate by consuming resource types that do not limit their growth.In particular, the limit n → −∞ ("Liebig's Law of the Minimum") gives the simplest qualitative example of this since each species has a single limiting resource that controls the growth rate and is insensitive to the concentrations of all the other resources.It can be shown that in this limit, the steady-state resource abundances minimize the same objective functions as above (eq.( 53)), but with the supply point R 0 replaced by an effective supply point R0 .The effective supply point R0 corrects the external supply vectors R 0 by accounting for depletion of non-limiting resources at steady state as follows: where the sum is over all species i that are limited by some resource β i other than α.The constrained minimization of d(R, R0 ) in equilibrium can be verified as before by simply substituting into the KKT conditions (48)(49)(50)(51).
While the notion of the effective supply point defined in eq. ( 54) allows the equilibrium conditions to be formally rewritten as an optimization problem, it also makes the objective function depend explicitly on the solution to the problem.A similar issue is encountered in machine-learning models with latent variables, where the most likely value of the latent variable is itself a function of the best-fit model parameters [78].In machine learning, this conundrum is solved through an iterative algorithm called Expectation Maximization (EM), where the model fitting is performed at each iteration with the current estimate of the most likely latent variable values, and then the latent variables are updated using the new model parameters.An analogous approach proves empirically successful on a range of asymmetric ecological models, including models of essential resources [72].

Ecology in high-dimensions 7.1 Making the leap to high-dimensions
Natural ecosystems often have an astounding degree of phylogenetic and physiological diversity.For example, microbial communities are extremely diverse, ranging from 500-1000 species in human guts [79] to over 10 3 species in marine ecosystems [80].Recent advances in DNA sequencing technologies makes it possible to measure microbial communities abundances at high resolution and has opened a precision era in microbial ecology [81,82].Understanding and modeling such complex datasets challenges current theories and analytical approaches.This has motivated the statistical physics community to start developing a theory of high-dimensional ecosystems.
In low dimensional ecosystems, carefully measuring the traits of each individual in the ecosystems is tractable.However, in high dimensions, this approach becomes intractable for both practical and theoretical reasons.Practically, it is impossible to characterize the preferences and interactions of hundreds of species.Theoretically, even if we could carry out this task, it becomes very difficult to make senes of the resulting high-dimensional dynamics.
A similar problem is encountered in physics when modeling systems such as gases which are composed of a large ensemble of particles.One pertinent example from statistical mechanics is the example of particles in a box.At any time t, the microstate of a system of N particles is described by 6N numbers, namely the positions ⃗ x i (t) and the momenta ⃗ p i (t), of the i = 1, . . .N particles.When N is small, the evolution of the microstate can be predicted precisely by solving Hamilton's equations for all particles.However, when N ≫ 1 ( e.g. for N ∼ 10 23 as is the case for a gas), the complexity of predicting the microstate is too high to be feasible.Instead, one is forced to describe the system statistically.Making an analogy between particles in a box and species in an ecosystem, this suggests that analytical approaches and theoretical insights derived from small ecosystems with a few species and resources may not scale up to large, diverse ecosystems.
We have stressed the practical difficulties in high dimension.Actually, taking the thermodynamic limit N → ∞ can also lead to a number of simplifications.As we all know, statistical mechanics provides a prescription for successfully dealing with systems with a very large number of degrees of freedom [83].For the particles in a box example, instead of trying to predict the behavior of individual particles, we can make accurate statements about the macroscopic quantities such as temperature and pressure which reflect averages over millions of particles.As a result of this averaging, macroscopic quantities often have universal behaviors.For example, the pressure, temperature, volume and entropy are related by Maxwell's relations regardless of if the gas particles being analyzed are oxygen, nitrogen or mixture of the two.This reflects the fact that in the thermodynamics limit, the behaviors we will observe will be typical.
The analogy between species in ecosystems and particles in box yields some general lessons from statistical mechanics that we can use to study high-dimensional ecology.First, we should concentrate of finding relationship between macroscopic properties of the ecosystem rather than the behaviors of any particular species or resource.Second, we should focus on characterizing the typical behaviors we would expect to see in diverse ecosystems.Finally, on a technical level, we must exploit the central limit theorem by averaging over species and resource traits to make predictions about ensembles of ecosystems rather than any particular ecosystem.

May's Stability Criteria and Random Matrix Theory
One of the first examples of the use of ideas from statistical mechanics to ecology was the pioneering work by physicist-turned-ecologist Robert May [84].May was inspired by Wigner's work on Uranium showing that one could recapitulate many aspects of experimental data by replacing the exact Uranium Hamiltonian by a large symmetric random matrix.The underlying reason for this was that the predicted properties of Uranium were typical.In other words, these properties did not depend on the detailed interactions of Uranium but instead were properties of any sufficiently complex system.Inspired by this, Wigner showed that one could model Figure 5: Schematic for May's stability criteria.The red scatter points are the eigenvalues on the complex plane.some aspects of complex Hamiltonians (i.e. the level statics) with large random matrices.
May transplanted these ideas to an ecological setting.He considered a general ecosystem with S species, with abundances N i , whose dynamics take the form He further assumed that the ecosystem had a fixed point given by N.He then asked about the nature of the dynamics around the fixed point.In this setting, the population dynamics can be described by Taylor expanding the dynamics around the equilibrium point N to get The question May asked is when S ≫ 1, when will the fixed point N be stable.From above equation, the stability of this equilibrium can be quantified in terms of the largest eigenvalue λ max of J i j .If λ max is positive, the equilibrium is unstable, and a small perturbation will cause the system to flow away from the equilibrium state.
In the 1960's, Jean Ginibre derived a mathematical formula for the distribution of eigenvalues in a special class of large random matrices [85].Girko's Circular Law states when the J i j are sampled independently from probability distributions with zero mean and variance σ 2 , in the limit S → ∞ its eigenvalues are uniformly distributed on a disk with radius r = Sσ in the complex plane, shown in Fig. 5.And thus, the largest eigenvalue λ max is at the boundary of this disk.
With this result in mind, May considered a simple ecosystem where each species inhibits itself, with J ii = −1, but different species initially do not interact with each other.This ecosystem is guaranteed to be stable for any level of diversity.He then examined how the stability is affected by adding randomly sampled interactions (i.e drawing the off-diagonal elements of J i j from a random distribution).From the arguments above, it is clear that the maximum eigenvalue λ max of J i j scales as λ max typically becomes positive when the root-mean-squared total strength Sσ 2 of interspecific interactions reaches parity with the intra-specific interactions.This gave rise to May's stability criterion for the maximum size S * of a stable ecosystem:  The species dynamics in eq. ( 60) are expressed as a factor graph.The edges are bidirectional and sampled from a Gaussian distribution.2. Add the "Cavity" species 0 as the perturbation.3. Sum the resource abundance perturbations from the "Cavity" species 0 at steady state and update the species abundance distribution to reflect the new steady state.4. Employing the central limit theorem and the non-negativity constraint, the species distribution is expressed as a truncated normal distribution.The susceptibility appearing in the species distribution is the self-consistency relation.
For a given pairwise interaction strength σ, this relation gives that the maximum number of species S * that an ecosystem can have before becoming unstable according to May.
As is clear from the argument above, a large number of assumptions were made in this derivation.Nonetheless, this result proved to be extremely influential.Before the 1970s, ecologists believed that diversity enhanced ecosystem stability.May's stability criteria challenged this idea and lead to what is now known as the "diversity-stability debate" [47].Since May's original publications, theorists showed how this criteria could be violated by changing the May's original assumptions, including but not limited to, adding biologically realistic correlation structures [86], modular structures [87], incorporating the dependence of the community matrix on population sizes [88], and considering high-order interactions Lotka-Volterra dynamics [89].

Cavity Method for Lotka-Volterra model
One major shortcoming of May's argument is that he considered purely linearized dynamics.Over the last few years, physicists have extended these ideas to more complex ecological models such including the Generalized Lotka-Volterra model (GLV) and the consumer resource models (CRMs).One crucial difference between May's linear dynamics and these latter two classes of models is that in both GLVs and CRMs, species can go extinct.This gives rise to additional self-organization due to species extinctions that fundamentally change the qualitative picture derived by May.We begin by outlining the cavity solution to the GLV.Our derivation closely follows [90], though we deploy what we consider to be more transparent notation.We consider a GLV described by S equations of the form Here g i are the intrinsic growth rates, r i are the carrying capacities and A i j encode inter-species interactions, with positive and negative values representing competition and mutualism interactions, respectively.We are interested in characterizing the steady-state solutions to these equations (i.e. d N i d t = 0) when the A i j are sampled from a random distribution.In other words we would like to characterize the statistical properties of {N j } that satisfy the fixed point equations 0 To do so, we will make use of the zero temperature cavity method [91].The cavity method has been successfully used in a wide variety of settings, ranging from spin glasses to computer science and combinatorial optimization.Over the last five or six years, there has been a concerted effort to develop and generalize the cavity method to solve ecological models with non-negative continuous variables (species and resource abundance) [34,90,92,93].
The basic idea behind the cavity method is to derive self-consistency equations by relating an ecosystem with S species to another ecosystem with S+1 species where the interaction coefficients A i j are drawn from the same random distribution (see Figure 6).In the thermodynamic limit S → ∞, the macroscopic observables of these two systems will be indistinguishable.In other words, statistical observables such as the first and second moment of species abundances 〈N 〉, N 2 in the two ecosystems, will be identical up to order ∼ 1/S corrections which become negligible in the thermodynamic limit.
Technically, we do this by adding a new "cavity" species 0 to the original ecosystem.The addition of this species can be viewed as a small perturbation to the original equilibrium state and hence can be calculated using perturbation theory.This allows us to derive self-consistency equations characterizing the observables of interest.In what follows, we assume that the A i j are sampled from a Gaussian distribution with mean A i j = µ S and variance var(A i j ) = σ 2 S .The Gaussian assumption is not essential.In fact, what is important is that the distribution is not long-tailed with finite first two moments give by the expression above.
In order to employ the cavity method, it is helpful to rewrite with a i j = 0 where −1 ≤ ρ ≤ 1 measures the amount of non-reciprocity between species-species interactions.In particular, ρ = 1 corresponds to the case of reciprocal interactions where species i affects species j the same way species j affects species i.The scaling in the mean and variance is chosen to keep the interaction term j A i j Nj in eq. ( 60) independent of system size.
for N 0 is negative.If we also require that the solution be noninvasible (i.e. that if a positive solution exist we choose that solution), we can solve for N 0 to get The max function simply takes the maximum value in the bracket.To leading order in 1/S, we can replace the a 0 j a i0 in the double sum by its expectation values so that S j,i=1 where we have defined the average trace of the susceptibility matrix Substituting this into Eq.70 , we get So far, we have not done anything out of the ordinary.However, at this point we will make use of the power of "self-averaging" and typicality.Instead, of characterizing the solutions to Eq. 70 for a particular choice of δr i and a i j , we will instead ask about the property of solutions "averaged" over the different realizations of these variables.In other words, we imagine solving Eq. 70 for many different choices of a i j and r i , where these variables are all drawn from the same distribution, and ask about the distribution of solutions of N 0 for these different choices.Rather than focus on the full distribution, we will characterize the distribution by three quantities: the fraction of times that N 0 is non-zero, φ N 0 , the mean value of the species abundance 〈N 0 〉, and the second moment of the species distribution 〈N 2 0 〉.Here, the brackets denote averages over different realizations of the random interactions a i j and δr i .
Since we are interested in the behavior of solutions over different realizations of the random interactions, we can use the central limit theorem to further simplify Eq. 70.It is clear that if we make a plot of the the sum S j=1 σa 0 j N j/0 in the numerator over different realizations of the a 0 j , it will be have like a Gaussian random variable with mean and variance where we have made use of Eq. 62 and defined the quantities Furthermore, by assumption δr i is a random normal variable with variance σ 2 r that is uncorrelated with a i j .Hence, the quantity S j=1 σa 0 j N i/0 + δr i is also a random normal variable with mean 0 and variance given by the sum of the variances of the individual terms: σ 2 r + σ 2 〈N 2 〉.We can combine these observations with eq. ( 73) to write an expression for the distribution of N 0 over different realizations of a i j and δr i .Namely, if we define z N to be a standard normal random variable with mean 0 and standard deviation 1, then the distribution of solutions for N 0 takes the form Notice that the the right hand argument in the max is just the formula for a Gaussian distribution with mean (r − µ 〈N 〉)/(1 − ρσ 2 ν) and variance Thus, the full distribution of N 0 given in Eq. 77 is a truncated Gaussian.A truncated Gaussian can be completely characterized by three quantities: the probability that the new species survives φ N 0 (i.e. the probability that N 0 > 0) and its first two moments, 〈N 0 〉 and 〈N 2 0 〉.Note that Eq. 77 still depends on three unknown quantities: the average trace of the susceptibility ν, as well as the first and second moments of the species distributions 〈N 〉 and 〈N 2 〉.We emphasize that 〈N 〉 and 〈N 2 〉 are moments over all species in an ecosystem for a single realization of the random parameters a i j and r i .The same holds true for ν.In order to make progress, we now invoke the idea of self-averaging.Namely, the statement that as the number of species S goes to infinity, the statistical distribution characterizing N i for single realization of the random parameters should be identical to the distribution characterizing N 0 over many different realizations of the random parameters.In the language of the statistical physics of disordered systems, we focus on the replica symmetric solution.
Self-averaging and replica symmetry follow from the assumptions that: (i) the new species we have introduced is statistically indistinguishable from the original S species and (ii) as S gets large, we can think of the entire ecosystem as being composed of smaller ecosystems, each with a different independent realization of the random parameters.As we will briefly discuss below, the replica symmetric solution is not always valid and more complicated things can happen, especially as one increases the variance of the underlying distribution from which the parameters are drawn.However, the replica-symmetric solution always serves as a good starting point for understanding the behavior of high-dimensional ecosystems.
Assuming replica symmetry and self-averaging, we can equate expectations of N 0 computed using the truncated Gaussian distribution in Eq. 77, with averages calculated over all species from a single realization.This allows us to derive self-consistency equations of the form: where in the last line we have used the definition of susceptibility Eqs.66 and 72.Explicitly differentiating Eq. 77 with respect to r 0 , this las self-consistency relation yields where φ N = S * S is the fraction of nonzero species in the ecosystem.Note that φ N results from the fact that if N 0 = 0, the corresponding derivative is zero.
In order to write the first three self-consistent equations more explicitly, it is helpful to define the following notation.Let y = max 0, c b z + a b , with z being a Gaussian random variable with zero mean and unit variance.Then its j-th moment is given by where in the last line we have defined the functions In terms of the w j , the self-consistency equations take the form These self-consistency equations can be solved numerically in Mathematica.Fig. 7 and Fig. 8 show the comparison between the cavity solution and 500 independent numerical simulations for various ecosystem properties for symmetric (ρ = 1) and uncorrelated (ρ = 0) interactions A i j .As can be seen in the figures, our analytic expressions agree remarkably well over a large range of σ c .Notice, for the ecosystem with symmetric interactions (Fig. 7), the numerics and analytic expression start to disagree for σ c ∼ 0.5.

Beyond Replica Symmetry
The underlying reason for the disagreement between the analytic predictions and the numerical simulations for ecosystems with symmetric interactions is that the replica symmetric solution becomes unstable for σ c ∼ 0.5, indicating a phase transition to the "replica-symmetry broken" phase.The breaking of replica symmetry is actually closely related to May's original arguments about stability of ecosystems.
To see this, it is helpful to recall that one important criteria for thinking about whether a system is stable is to ask how sensitively it responds to perturbations in external parameters.We emphasize that this criteria is closely related to, but different from, asking about the stability of ecosystems to perturbing the species abundances.It is conceptually helpful to consider this distinction in greater mathematical detail.To do so, we will focus only on the S * surviving species, rather than all S original species in the regional species pool.From Eq. 60, we know at steady-state we must have (where i runs over the S * surviving species) where A * i j denotes the S * ×S * interaction sub-matrix of the full interaction matrix A i j restricted to surviving species and the index i runs over the S * surviving species.
Then, the stability to perturbations in species abundances (the kind of stability considered by May), can be calculated by linearizing this equation to get an equation for the deviations δN i from the steady-state abundances N i : where δ ik is the Kronecker Delta function (i.e. S * × S * identity matrix).This is the analogue of Eq. 56.Notice that this stability is set by the eigenvalues of the matrix −N i A * i j which unlike the case May considered depends on the steady-state abundances at the steady-state.
For this reason, it will be helpful to work with a slightly different quantity which ask about the sensitivity of steady-state abundances N i to small changes in the external parameters r j .This is precisely the susceptibility, Eq. 66, but now restricted to surviving species.We denote this restricted susceptibility by ν * i j .We can solve for it by dividing Eq. 86 by N i and then differentiating by r j to get where the Kronecker-delta function δ * i j comes from the fact that It turns out that RS breaking corresponds to the case where A * i j develops a zero eigenvalue, indicating that the susceptibility diverges.In other words, the system becomes infinitely sensitive to external perturbations.Fig. 7 (D) shows the minimum eigenvalue of A * i j as σ c is varied.The minimum eigenvalue decrease monotonically with increasing σ c until it is close to zero.And then the cavity solution fails.This correspond to the emergence of multiple attractors (MA phase).Further increasing σ c results in unbounded growth.The full phase diagram and consequences of this observations were first worked out in [90].To analyze these phases, one can recast the problem using the replica approach with replica symmetry breaking [94].
One of the most striking things about the GLV model is that the minimum eigenvalue of A * i j gets pinned exactly to zero over an extended region of the parameter space (region indicated by MA in Fig. 7 (D)).This corresponds to the phenomena of marginal stability.In other words, though the system is not unstable, it is infinitely sensitive to small changes in external parameters.This marginal stability is unique to the symmetric GLV model and is an indication that the ecosystem self-organizes through species extinction.Generically with non-reciprocal interactions, the marginal stable phase disappears and is replaced by a dynamic phase that can exhibit chaotic fluctuations [25][26][27][28][29][30].

Cavity Method for MacArthur Consumer-Resource Models
Just as we can use the cavity method to analyze the Lotka-Volterra model in high-dimensions, we can also adapt the cavity method to analyze consumer resource models (CRMs) in high dimensions.We will be interested in developing a statistical understanding of CRMs in the limit where the number of resources M and the number of species S in the regional species is  1.The initial parameter information consists of the probability distributions for the mechanistic parameters: K α , m i and C iα .We assume they can be described by their first and second moments.90) are expressed as a factor graph. 3. Add the "Cavity" species 0 as the perturbation.4. Sum the resource abundance perturbations from the "Cavity" species 0 at steady state and update the species abundance distribution to reflect the new steady state.5. Employing the central limit theorem and the non-negativity constraint, the species distribution is expressed as a truncated normal distribution.6. Repeat Step 2-4 for the resources.7. The resource distribution is also expressed as a truncated normal distribution.8.The self-consistency equations are obtained from the species and resource distributions.large.In these lectures, we limit our analysis to MacArthur's original model ( eq. ( 6) and eq.( 7)).For simplification, we set w α = 1 and r α = K α and get the following dynamics:

The species dynamics
The basic idea behind the cavity method is to derive self-consistency equations by relating an ecosystem with S species and M resources to another ecosystem with S + 1 species and M + 1 species.Unlike in the GLV model were we just had to one new species to the ecosystem, here it is important that we add a new species N 0 and new resource R 0 to the ecosystem (see Figure 9).We once again will work in the thermodynamic limit, S, M → ∞, but with the ratio S/M = γ −1 fixed.As for the GLV model, we further assume that the system is self-averaging and can be described using a "replica symmetric" ansatz.With these assumptions, just as in the GLV model, we will use perturbation theory to relate solutions in the (S, M )-ecosystem to the (S + 1, M + 1)-ecosystem.The logic behind the cavity solution of the MacArthur CRM is very similar to the derivations for the GLV model and for this reason this section is more condensed than the rest of these lectures.
The basic steps needed to derive the cavity solution are shown in Fig. 9 and outlined below.
Step 1: the consumer preferences, c iα , are drawn from a Gaussian distribution with mean µ/M and variance σ 2 c /M .They can be deposed into a deterministic and fluctuating component: where the fluctuating part d iα obeys the statistics We also assume that both the carrying capacity K α and the minimum maintenance cost m i are independent Gaussian random variables with mean and covariance given by We also define 〈R〉 = 1 M β R β and 〈N 〉 = 1 S j N j to be the average resource and average species abundance, respectively.In terms of these quantities, we can re-write eq. ( 90) and eq.( 91) as where In Step 2, we express eq.( 100) as a bipartite factor graph model for visualization.At step 3, we add a "cavity" species N 0 and a "cavity" resource R 0 into the ecosystem, Adding new species and resource will perturb the original steady state.To characterize the perturbations, we introduce the following susceptibility matrices: We can express the steady-state species and resource abundances in the (S + 1, M + 1) system with a first-order Taylor expansion around the (S, M ) values.To do so, notice that in the presence of the new species Since in eq. ( 92), the contribution of the mean, and (99) are given by σ c d i0 R 0 and σ c d 0α N 0 , respectively.Namely, in the presence of the new species (to leading order in 1/M or 1/S) one has Note j/0 and β/0 mean the sums exclude the new species 0 and the new resource 0. These are just the analogue of Eqs.67 for the GLV model The next step is to plug eq. ( 106) and eq.( 107) into eq.( 101) and eq.( 102) and solve for the steady-state value of N 0 and R 0 .

Self-consistency equations for species
For the new "cavity" species, the steady equation takes the form Notice that each of the sums in this equation is the sum over a large number of uncorrelated random variables, and can therefore be well approximated by Gaussian random variables for large enough M and S.
As in the cavity solution of the GLV, we can replace the double sums over d j0 d 0β and d 0α d 0β by their expectation values using eq.( 93) and eq.( 94): where is the average susceptibility.Substituting these expressions into eq.( 109) , we obtain Instead of looking for a solution of N 0 for one realization of the random parameters, we can ask about the statistical distribution of the solution space for many different realizations of the random parameters.In this case, we know that β/0 σ c d 0β R β/0 can be well described by a Gaussian with mean and variance where q R is the second moment of the resource distribution, Introducing an auxiliary random normal variable z N with zero mean and unit variance, this allows us to replace the sum below with a random variable as follows: where we have used the fact that by assumption δm 0 is uncorrelated with d 0β and R β/0 .We can solve eq. ( 111) in terms of the quantities just defined to get Rearranging this to solve for N 0 and noting that N 0 ≥ 0, one gets which is the formula for a truncated Gaussian.In writing this solution, we have enforced the condition that, if possible, N 0 must be positive.Ecologically, this is the statement that we are interested in the statistical description of uninvasible fixed points.Combining eq. ( 115) and eq.( 81), by invoking self-averaging and replica symmetry we can easily write down the self-consistency equations for the fraction of non-zero species as well as the moments of the species abundances at the steady state:

Self-consistency equations for resource
We now derive the equations for the steady-state of the resource dynamics.Inserting eq. ( 107) into eq.( 102) gives: where γ −1 = S M .To proceed, we note in the double sums above we can replace the sums over the d i0 d j0 and d j0 d 0β by their expectation values using eq.( 93) and eq.( 94) to get i/0, j/0 Furthermore, notice that σ c d 00 N 0 is of order 1/ M and can be ignored in the thermodynamic limit.Finally, since we are interested in the statistical properties when we average over random parameters we can replace the sum over N j/0 by Gaussian random variable of the form where z R is unit normal random variable and q N = N 2 = 1 S j N 2 j .In writing this, we have once again used the fact that δK 0 is uncorrelated with d j0 and N j/0 .
We can substitute these equations into eq.( 120) and solve for R 0 .If we further require that the resource are uninvasible (i.e.we always choose the positive solution for R 0 if it exists), then the distribution of R 0 is described by a truncated Gaussian of the form This allows us to write down the self-consistency equations for the fraction of non-zero resources as well as the moments of their abundances at the steady state: 〈R〉= One can also use these equations to solve for the trace of the susceptibilities.The susceptibilities are given by averaging ν N ii and χ R αα , Solving above two equations yields Collectively Eqns 116 -118, 124-126, and 129 define 8 self-consistency equations for the 8 quantities φ N , 〈N 〉, 〈N 2 〉, φ R , 〈R〉, 〈R 2 〉, χ, and ν that can be solved numerically.
Fig. 10 shows a comparison between the cavity solution and 1000 independent numerical simulations for various ecosystem properties such as the fraction of surviving species S * /S, the fraction of surviving resources M * /M , and the first and second moment of the species and resource distributions.As can be seen in the figure, our analytic expressions agree remarkably well over a large range of σ c .M and the first and second moments of the species and resources distributions as a function of σ c .The error bar shows the standard deviation from 100 numerical simulations with M = S = 100 µ = 1., K = 1., σ K = 0.1, m = 0.1, σ m =0.01.Simulations were run using the CVXPY package [95].
As a further check on our analytic solution, we ran simulations where the c iα were drawn from different distributions.One pathology of choosing c iα from a Gaussian distribution is that when σ c is large, many of consumption coefficients are negative.Another peculiarity of a Gaussian is that every species interacts with every other species To test whether our cavity solution still describes ecosystems when c iα are strictly positive or sparse, we compare our cavity solution to simulations where the c iα are drawn from either a Bernoulli or uniform distribution.The results are shown in Figs.11 and 12.As before, there is remarkable agreement between theoretical predictions and numerical simulations.The fact that our analytics also describe simulations from these alternative distributions highlights the incredible power of the central limit theorem and the cavity method.In fact, we expect that many properties diverse ecosystems can be modeled using random ecosystems even when the consumer preferences have considerable structure, an idea explored using the cavity method and random matrix theory here [96].

Conclusions
The goal of these lectures was to familiarize the reader with recent work at the intersection of statistical physics and community ecology.The bulk of the lectures focused on the two main models of community ecology: the generalized Lotka-Volterra model (GLV) and MacArthur's Consumer Resource Models (MCRM).We first presented these models in low-dimensions, where graphical methods could be used to develop intuition and understanding.We then discussed the deep connection between ecosystems with reciprocal interactions and constrained optimization before proceeding to analyze these same models in "high-dimensions" using methods from the statistical physics.
We hope that these lectures can serve as a compact point of reference for those interested in community ecology.While many of the results presented here are scattered in the literature, it has been our experience there is a dearth of comprehensive pedagogical resources for beginners.It is our sincere hope that these lectures can fill this void.As emphasized in the introduction, in no way are these lectures comprehensive.Instead, they should be viewed as

Acknowledgments
We would like to thank the Mehta group (especially Zhijie "Sarah" Feng and Emmy Blumenthal) and our collaborators for countless conversations and insights.We are extremely grateful to Arvind Murugan for his deep engagement with the work presented here, including critically reading an early version of these lectures and his many suggestions for improving the presentation.We would also like to thank Akshit Goyal for helpful comments on the presentation.This work was supported by NIH NIGMS R35GM119461 and a CZI Theory grant to PM.

A Consumer resource models
While all consumer resource models are broadly based on one idea -consumer growth depends on resource availability and consumer growth impacts resource availability, the mathematical forms can vary significantly based on detailed assumptions made about the nature of consumer growth, resource consumption by consumers and resource renewal.Although in principle g i , q iα and f α are chosen from an infinite variety of possible functions, in practice there are only a few forms that are commonly used.We will consider two possibilities for each, generating eight possible models, tabulated in Table 1.

A.1 Substitutable Resources
For substitutable resources, the growth rate takes the form where c iα is a set of foraging preferences, w α represents the value of each resource in the "common currency" (e.g., energy), and m i is a minimal amount of consumption required to sustain growth.

A.2 Nonsubstitutable Resources
A general form for the growth rate with nonsubstitutable resources was given by Schreiber and Tobiason (2003): where the exponent n can be any real number.Different values of n correspond to different ecological categories of resources, as shown in

A.3 Mass action
The simplest and most common way of modeling the impact vectors is to assume mass action kinetics, where the depletion rate of resource α by consumer species i is proportional to the rate of encounters between that resource and that species.This gives: for some set of constants v iα with units of [consumer density] −1 ×[time] −1 .

A.4 Marginal benefit foraging
(see [69], p. 171) The resource depletion rate in the mass action model is not related in any definite way to the growth rates.But in real systems, these two quantities are clearly related, because the growth is sustained by uptake of resources.One way of modeling this relationship is with marginal benefit foraging, where the relative foraging effort invested in each resource type is proportional to the partial derivative of the growth rate with respect to the (value-weighted) abundance of that resource.Instead of arbitrary constants v iα determining the encounter rates between consumers and resources, the rates are now tied to the growth law by the following formula: where a i is a constant that sets the overall encounter rate, which we will always take to be equal to 1.

A.5 Self-renewing resources
In MacArthur's original consumer resource model, he considered biotic resources such as plants serving as food for herbivores, and modeled the intrinsic resource dynamics with a logistic law: where K α is the carrying capacity and r α is the low-density growth rate.

A.6 Externally supplied resources
When the resources are not alive, it is usually more physically reasonable to model them as diffusing in from some external source at some rate κ α , and decaying or diluting at a fixed rate ω α , leading to the linear law A.7 Explicit equations for all models A.7.1 Substitutable, Mass action, Self-renewing A.7.8 Nonsubstitutable, Marginal benefit, Externally supplied

B Self-limitation and stability
We can derive conditions for stability in the range of models described above.While the detailed analysis differs from case to case, qualitatively, the stability criterion is that each species must limit itself more than it limits others.The typical formulation of this in the niche theory framework is that each species consumes most of the resource that is most limiting to it.Here we generalize this statement to arbitrary numbers of species and resources.
Here, we first consider substitutable resources but different resource replenishment dynamics and impact vectors q iα .We show that non-substitutable resources can be handled by locally expanding the contours We fill focus on the basic example of MCRM with two resources and two species, but allowing an arbitrary impact vector v iα R α that need not be related to c iα : The easiest thing to do is move to the "fast resource limit" (see below) and check the stability of the competition matrix.Following the derivation from the main text, and assuming no resources go extinct, we obtain: For two species and two resources, we can explicitly compute the eigenvalues, finding: where we have defined the inner product and |c| is the determinant of the matrix c iα .We see that all the eigenvalues are positive, implying stability, only when |c| and |v| have the same sign: Writing out the determinants explicitly, we have: where the θ 's denote the angle that the consumption vector and depletion vector of each species make with the resource 1 axis.The stability criterion thus implies that the species whose depletion vector v iα makes a bigger angle with the resource 1 axis must also have the biggest angle for its consumption vector.If we say that the species with a larger θ c i is limited by resource 2, while the one with a smaller θ c i is limited by resource 1, then the equilibrium is stable whenever the depletion from each species is more biased towards its own limiting resource than the depletion from the other species is.But both depletion vectors could very well be extremely biased towards the same resource -it's just the order that matters.And the magnitude of the impact doesn't matter at all, just the relative angle.\bibliography{references}

Figure 2 :
Figure 2: The metacommunity framework.(a) The metacommunity framework incorporates the three fundamental ecological processes of selection, dispersal and drift within a unified mathematical framework.(b) Within this framework, three areas of intense study have been niche theory, generalized Lotka-Volterra theory and neutral theory.The domains of applicability for these theories partially overlap.

Figure 3 :
Figure 3: Niche Theory Examples.(a,b,c) Different resource supply choices with substitutable resources yield different coexistence cone geometries.(d,e,f) Stability depends on relative orientation of impact vectors.

Figure 4 :
Figure 4: Reinterpreting MacArthur's Minimization Principle.(a) Contour lines of MacArthur's objective function Q(N), in the space of population sizes, as defined in full in eq.(41) and(42).The 'x' marks the equilibrium eventually attained in direct numerical simulation of eq.(6-7) with the same parameters (r α = m i = w α = 1 for α = 1, 2, i = 1, 2; K 1 = 4.8, K 2 = 2.85, c 1α = (0.5, 0.3), c 2α = (0.4,0.6)).The direct simulation ends up at the point where Q is minimized, as predicted by MacArthur.Also shown for illustration is a simplified expression for Q with the r α and w α set to 1.(b) Contour lines in the environmental space of resource abundances, representing the dissimilarity measure d(R 0 , R) with respect to the supply point R 0 .The uninvadable equilibrium state, indicated by the black dot, minimizes d under the uninvadability constraint g i (R) ≤ 0, which constrains the environmental state to lie within the shaded region Ω bounded by the zero net-growth isoclines (ZNGI's, colored lines).For MacArthur's model of competition for noninteracting resources, with r α = w α = 1 as in the previous panel, d is simply the Euclidean distance.

Figure 6 :
Figure 6: Schematic outlining steps in cavity solution for Lotka-Volterra model.1.The species dynamics in eq.(60) are expressed as a factor graph.The edges are bidirectional and sampled from a Gaussian distribution.2. Add the "Cavity" species 0 as the perturbation.3. Sum the resource abundance perturbations from the "Cavity" species 0 at steady state and update the species abundance distribution to reflect the new steady state.4. Employing the central limit theorem and the non-negativity constraint, the species distribution is expressed as a truncated normal distribution.The susceptibility appearing in the species distribution is the self-consistency relation.

Figure 7 :
Figure 7: Comparison between the cavity solution (equation 83 -85) and simulations for symmetric interactions (ρ = 1), (a) the fraction of surviving species φ N = S * S and (b) the first moment 〈N 〉 and (c) the second moment N 2 of the species distributions as a function of σ c .(d) The minimum eigenvalue of the submatrix A * i j at at different σ c .The error bar shows the standard deviation from 500 numerical simulations with S = 200, µ = 2., r = 1., σ r = 0.1 and ρ = 1.The black solid lines separate the results in three different regimes: unique fixed point, multiple attractors and unbound growth.The black dashed line in Panel (d) is the base line λ = 0 for comparison.

Figure 8 :
Figure 8: Comparison between the cavity solution (equation 83 -85) and simulations for uncorrelated interactions, ρ = 0, (a) the fraction of surviving species φ N = S * S and (b) the first moment 〈N 〉 and (c) the second moment N 2 of the species distributions as a function of σ c .(d) The minimum eigenvalue of the submatrix A * i j at at different σ c .The error bar shows the standard deviation from 500 numerical simulations with S = 200, µ = 2., r = 1., σ r = 0.1 and ρ = 0.

Figure 9 :
Figure 9: Schematic outlining steps in cavity solution.1.The initial parameter information consists of the probability distributions for the mechanistic parameters: K α , m i and C iα .We assume they can be described by their first and second moments.2.The species dynamicsN i ( α C iα R α − m i ) in eqs.(90) are expressed as a factor graph. 3. Add the "Cavity" species 0 as the perturbation.4. Sum the resource abundance perturbations from the "Cavity" species 0 at steady state and update the species abundance distribution to reflect the new steady state.5. Employing the central limit theorem and the non-negativity constraint, the species distribution is expressed as a truncated normal distribution.6. Repeat Step 2-4 for the resources.7. The resource distribution is also expressed as a truncated normal distribution.8.The self-consistency equations are obtained from the species and resource distributions.

µM
∼ O( 1 M ), is much smaller than the fluctuation term, σ c d i0 ∼ O( 1 M), to leading order the perturbations to m i , and K α in eqs.(100)

2 >Figure 10 :
Figure 10: Comparison between cavity solutions (see main text for definition) and simulations for the fraction of surviving species φ N = S * S , the fraction of surviving species φ R = M *M and the first and second moments of the species and resources distributions as a function of σ c .The error bar shows the standard deviation from 100 numerical simulations with M = S = 100 µ = 1., K = 1., σ K = 0.1, m = 0.1, σ m =0.01.Simulations were run using the CVXPY package[95].

Table 1 :
Table of modeling choices a good starting point for delving into this rich and exciting research area.

Table 2 (
cf. Koffel 2016).The weights w α are typically set to 1, but are included so that this model reduces exactly to the above substitutable model when n = 1.

Table 2 :
Taxonomy of resource interactions